Empirical process

The study of empirical processes is a branch of mathematical statistics and a sub-area of probability theory. It is a generalization of the central limit theorem for empirical measures. Applications of the theory of empirical processes arise in non-parametric statistics.

Contents

Definition

It is known that under certain conditions empirical measures P_n uniformly converge to the probability measure P (see Glivenko–Cantelli theorem). The theory of Empirical processes provides the rate of this convergence.

A centered and scaled version of the empirical measure is the signed measure

G_n(A)=\sqrt{n}(P_n(A)-P(A))

It induces a map on measurable functions f given by

f\mapsto G_n f=\sqrt{n}(P_n-P)f=\sqrt{n}\left(\frac{1}{n}\sum_{i=1}^n f(X_i)-\mathbb{E}f\right)

By the central limit theorem, G_n(A) converges in distribution to a normal random variable N(0, P(A)(1 − P(A))) for fixed measurable set A. Similarly, for a fixed function f, G_nf converges in distribution to a normal random variable N(0,\mathbb{E}(f-\mathbb{E}f)^2), provided that \mathbb{E}f and \mathbb{E}f^2 exist.

Definition

\bigl(G_n(c)\bigr)_{c\in\mathcal{C}} is called an empirical process indexed by \mathcal{C}, a collection of measurable subsets of S.
\bigl(G_nf\bigr)_{f\in\mathcal{F}} is called an empirical process indexed by \mathcal{F}, a collection of measurable functions from S to \mathbb{R}.

A significant result in the area of empirical processes is Donsker's theorem. It has led to a study of the Donsker classes such that empirical processes indexed by these classes converge weakly to a certain Gaussian process. It can be shown that the Donsker classes are Glivenko–Cantelli classes, the converse is not true in general.

Example

As an example, consider empirical distribution functions. For real-valued iid random variables X_1,X_n,\dots they are given by

F_n(x)=P_n((-\infty,x])=P_nI_{(-\infty,x]}.

In this case, empirical processes are indexed by a class \mathcal{C}=\{(-\infty,x]:x\in\mathbb{R}\}. It has been shown that \mathcal{C} is a Donsker class, in particular,

\sqrt{n}(F_n(x)-F(x)) converges weakly in \ell^\infty(\mathbb{R}) to a Brownian bridge B(F(x)) .

See also

References

External links